World-Wide Web scaling exponent from Simon's 1955 model 
Recently, statistical properties of the World-Wide Web have attracted considerable attention when 
self-similar regimes have been observed in the scaling of its link structure. Here we recall a classical 
model for general scaling phenomena and argue that it offers an explanation for the World-Wide 
Web's scaling exponent when combined with a recent measurement of internet growth. 


A quantity important for searching the World-Wide Web is the number k
of links that point to a particular web page. Its probability
distribution P(k) exhibits power-law scaling, $P(k) \sim k^{-\gamma}$,
that is not readily explained by standard random graph theory. An
elegant model for scaling in copy and growth processes was proposed by
Simon in 1955; it describes scaling behaviour as observed in
distributions of word frequencies in texts or population figures of
cities. It models the dynamics of a system of elements with associated
counters (e.g., words and their frequencies in texts, or nodes in a
network and their connectivity k). The dynamics is based on constant
growth via the addition of new elements (new instances of words) and on
incrementing the counters (new occurrences of a word) at a rate
proportional to their current values.

Reformulating this to model network growth, consider a network with n
nodes with connectivities $k_j$, $j = 1 \dots n$, forming classes [k]
of f(k) nodes with identical connectivity k. Iterate the following
steps:

(i) With probability $\alpha$, add a new node and attach a
link to it from an arbitrarily chosen node.

(ii) Else add one link from an arbitrary node to a node
j of class [k] chosen with probability


$$P_{\text{new link to class }[k]} \propto k\, f(k). \qquad (1)$$
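
The two steps above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' code: the names `simulate_simon` and `class_sizes` are hypothetical, the process starts from a single node holding one link, and within a class the target node is chosen uniformly, which is one admissible way to realise rule (1). The node a link starts from does not affect the counts k and is therefore not tracked.

```python
import random
from collections import Counter


def simulate_simon(alpha, steps, seed=0):
    """Run the two-step growth process and return the in-link count of each node.

    Assumptions (not from the source): one initial node with one link,
    and uniform choice among the members of a class.
    """
    rng = random.Random(seed)
    k = [1]        # k[j] = number of links pointing to node j
    targets = [0]  # one entry per link: index of the node it points to
    for _ in range(steps):
        if rng.random() < alpha:
            # Step (i): add a new node that receives one link.
            k.append(1)
            targets.append(len(k) - 1)
        else:
            # Step (ii): picking a uniformly random existing link and reusing
            # its target selects node j with probability k[j] / sum(k), so a
            # class [k] is hit with probability proportional to k * f(k).
            j = targets[rng.randrange(len(targets))]
            k[j] += 1
            targets.append(j)
    return k


def class_sizes(k):
    """Return f(k): how many nodes share each connectivity value."""
    return Counter(k)


if __name__ == "__main__":
    counts = simulate_simon(alpha=0.1, steps=200_000, seed=1)
    f = class_sizes(counts)
    for degree in (1, 2, 4, 8, 16, 32):
        print(degree, f.get(degree, 0))
```

Keeping one list entry per link makes the preferential choice a single uniform draw, at the cost of memory proportional to the number of links.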


For this stochastic process, Simon finds a stationary 
solution exhibiting power-law scaling with exponent 


$$\gamma = 1 + \frac{1}{1-\alpha}. \qquad (2)$$
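
A heuristic sketch of how an exponent of this form can arise is given below. It is a rate-equation argument for the expected class sizes, not a reproduction of Simon's original treatment. The symbols t (number of links added so far), f(k,t) (expected number of nodes with connectivity k after t links), p(k) and $\rho$ are notation introduced here for illustration.

```latex
% Assumption: every step adds exactly one link, so \sum_k k\, f(k,t) = t.
\begin{align}
f(1,t+1) - f(1,t) &= \alpha - (1-\alpha)\,\frac{f(1,t)}{t}, \\
f(k,t+1) - f(k,t) &= (1-\alpha)\,\frac{(k-1)\,f(k-1,t) - k\,f(k,t)}{t},
  \qquad k > 1 .
\end{align}
% Stationary ansatz f(k,t) = t\,p(k), inserted into the second line:
\begin{align}
p(k) &= \frac{k-1}{k+\rho}\; p(k-1), \qquad \rho = \frac{1}{1-\alpha}, \\
p(k) &\propto \frac{\Gamma(k)}{\Gamma(k+\rho+1)} \sim k^{-(1+\rho)}
  \qquad (k \to \infty),
\end{align}
% hence \gamma = 1 + \rho = 1 + 1/(1-\alpha).
```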


The only free parameter of the model, $\alpha$, reflects the growth in
the number of nodes relative to the growth in the number of links.
Small values of $\alpha$ therefore predict scaling exponents near
$\gamma \approx 2$.

Let us apply this process to model the evolution of the World-Wide
Web, identifying nodes with web pages. Data from two recent
comprehensive Altavista crawls provide an estimate for $\alpha$ in the
present internet. These two measurements counted 203 million pages and
1466 million links in May 1999, and 271 million pages and 2130 million
links in October 1999. The probability of adding a new web page is
estimated from the observed increase in counts, 68 million new pages
against 664 million new links, as $\alpha \approx 0.10$. The resulting
prediction of Simon's model for the exponent of the link distribution
is $\gamma = 2.1$, which compares well with current experimental
results, $\gamma = 2.1 \pm 0.1$ and $\gamma = 2.09$.
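
The arithmetic behind this estimate can be checked in a few lines. This is a sketch only: the function names `estimate_alpha` and `simon_exponent` are hypothetical, and the estimate assumes, as in the process above, that every step adds exactly one link, so that the ratio of new pages to new links approximates the probability of step (i).

```python
def estimate_alpha(pages_start, links_start, pages_end, links_end):
    """Fraction of growth steps that added a page: new pages per new link."""
    return (pages_end - pages_start) / (links_end - links_start)


def simon_exponent(alpha):
    """Scaling exponent of Simon's stationary solution, equation (2)."""
    return 1.0 + 1.0 / (1.0 - alpha)


# Crawl counts in millions, as quoted in the text (May and October 1999).
alpha = estimate_alpha(203, 1466, 271, 2130)
gamma = simon_exponent(alpha)
print(f"alpha = {alpha:.3f}, gamma = {gamma:.2f}")  # alpha = 0.102, gamma = 2.11
```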

To compare with recently proposed models, it may be interesting to
note that the model by Barabási and Albert can be mapped to the
subclass $\alpha = 1/2$ of Simon's model, when using the simpler
probability for a node being connected to another node i with
connectivity $k_i$,


$$P_{\text{new link to } i} \propto k_i. \qquad (3)$$


Note that (3) implies (1), whereas the reverse is not true. Otherwise
both models are based on the same two assumptions of growth and
preferential linking. From this viewpoint, it is insightful to
reconsider a recent discussion of their model. Adamic and Huberman
point out that the "rich-get-richer" behaviour of single nodes imposed
by (3) correlates age and connectivity of nodes. Such a correlation,
however, is contradicted by the data they present. They suggest adding
individual growth rates to each node, and Barabási et al. show in
response how to do so. While this solves the correlation problem, the
price to pay is a large number of free parameters in the extended
model. A simple solution to this problem has already been provided by
Simon: linking is guided by (1) instead of (3), considering not single
nodes but classes of nodes with identical connectivities. This allows
for different growth rates among class members, leaving just one free
parameter. Above we determine this parameter from experimental data,
enabling Simon's classical scaling model to estimate the connectivity
exponent of the World-Wide Web as $\gamma = 2.1$.
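
The step from the node-level rule (3) to the class-level rule (1) amounts to summing over the members of a class. As a short worked equation, using only the symbols defined above:

```latex
P_{\text{new link to class }[k]}
  = \sum_{i \in [k]} P_{\text{new link to } i}
  \propto \sum_{i \in [k]} k_i
  = k\, f(k).
```

Conversely, (1) fixes only this sum over the class, so individual members of a class may receive links at unequal rates.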